I Released A Discord Bot That Describes Images And I Am Surprised It Works
I released something that might actually help people. It is called Blindbot. It is a Discord bot that uses Ollama to describe images for blind and visually impaired users. I am surprised it works. I am also surprised I finished it. Both surprises are valid.
Building accessibility tools feels different. The stakes feel higher. The margin for error feels smaller. I am not used to that feeling. I am learning to live with it.
What Blindbot Does
The bot watches for images in Discord channels. When someone posts an image, it automatically generates a description. The description is formatted for screen readers. No markdown. No visual fluff. Just words that convey what the image contains.
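Here is a minimal sketch of what "no markdown" can mean in practice. It is not Blindbot's actual code. The function name to_plain_text and the exact rules are assumptions for illustration. It strips links, headings, bullets, and asterisk or backtick emphasis, then collapses whitespace so a screen reader gets one clean run of text.

```python
import re


def to_plain_text(description: str) -> str:
    """Flatten model output into plain text for a screen reader (illustrative only)."""
    text = description
    # Turn [label](url) links into just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Drop heading, bullet, and blockquote markers at the start of each line.
    text = re.sub(r"(?m)^\s*(?:#{1,6}|[-*+]|>)\s+", "", text)
    # Unwrap **bold**, *italic*, and `code`. Underscores are left alone on
    # purpose so that words like snake_case survive.
    text = re.sub(r"(\*\*|\*|`)(.+?)\1", r"\2", text)
    # Collapse every run of whitespace, newlines included, into one space.
    return " ".join(text.split())
```

Calling to_plain_text on a heading followed by a bulleted sentence returns the words alone, on one line, with nothing for a screen reader to trip over.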
It handles GIFs by extracting frames. It deduplicates near-identical frames using dHash and pixel difference scoring. It picks the most visually interesting frames to describe. This prevents the bot from describing the same frame five times in a row. That would be annoying. I avoided the annoyance.
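The dHash half of that can be sketched in a few lines. This is an illustration, not the code in the repo. It assumes each frame has already been shrunk to a small grayscale grid, 9 pixels wide by 8 tall, by some image library. The names dhash, hamming, and drop_near_duplicates are hypothetical, and so is the distance threshold of 5. The pixel difference scoring and the "most interesting frame" selection are not shown.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale values, 0-255


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(frames: List[Grid], max_distance: int = 5) -> List[int]:
    """Return indices of frames that differ enough from every frame kept so far."""
    kept: List[int] = []
    kept_hashes: List[int] = []
    for index, grid in enumerate(frames):
        frame_hash = dhash(grid)
        if all(hamming(frame_hash, seen) > max_distance for seen in kept_hashes):
            kept.append(index)
            kept_hashes.append(frame_hash)
    return kept
```

A 9 by 8 grid gives a 64-bit hash. Two frames that look almost the same differ in only a few bits, so the second one is skipped.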
It supports multiple formats. Images. GIFs. Videos in MP4, WebM, and MOV. PDFs. SVGs. I did not test every format thoroughly. I tested enough to feel confident. That is the CompactAI standard for release readiness.
Step 1: Install Ollama and pull a vision model
ollama pull gemma4:e4b
Step 2: Clone the repo and set up environment
git clone https://github.com/CompactAIOfficial/Blindbot.git
cd Blindbot
Step 3: Add your Discord bot token to .env
Step 4: Install dependencies and run
pip install -r requirements.txt
python bot.py
# If it crashes, that is on me. Report issues.
Per-Server Configuration
Each Discord server can configure Blindbot independently. Use slash commands to adjust settings. Set the detail level to brief, standard, or detailed. Choose how GIFs are processed. View cache statistics. Clear the cache if needed. The bot respects server preferences. It does not force a one-size-fits-all approach.
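One way to picture per-server settings is a small store keyed by server ID. This is a sketch under assumptions, not how Blindbot stores its data. The class names, the gif_mode option value, and the choice of "standard" as the default are all hypothetical. Only the three detail levels come from the description above.

```python
from dataclasses import dataclass
from typing import Dict

DETAIL_LEVELS = ("brief", "standard", "detailed")


@dataclass
class ServerSettings:
    detail_level: str = "standard"  # assumed default
    gif_mode: str = "key_frames"    # hypothetical option name


class SettingsStore:
    """In-memory settings, one entry per Discord server ID."""

    def __init__(self) -> None:
        self._by_server: Dict[int, ServerSettings] = {}

    def get(self, server_id: int) -> ServerSettings:
        # Servers that never changed anything get a fresh set of defaults.
        return self._by_server.setdefault(server_id, ServerSettings())

    def set_detail_level(self, server_id: int, level: str) -> None:
        if level not in DETAIL_LEVELS:
            raise ValueError(f"detail level must be one of {DETAIL_LEVELS}")
        self.get(server_id).detail_level = level
```

A slash command handler would call set_detail_level with the server's ID. Changing one server leaves every other server on its own settings.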
The feedback system lets users react to descriptions with thumbs up or thumbs down. Thumbs down removes the description from cache, so a bad description is not served again. This helps improve quality over time. It also gives users agency. Agency matters. Especially in accessibility tools.
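The cache and the thumbs down rule fit in one small class. Again, a sketch and not the real implementation. Keying the cache by a SHA-256 hash of the image bytes is an assumption, and DescriptionCache, key_for, and handle_reaction are hypothetical names. The hit and miss counters stand in for the cache statistics mentioned earlier.

```python
import hashlib
from typing import Dict, Optional

THUMBS_UP = "\U0001F44D"
THUMBS_DOWN = "\U0001F44E"


class DescriptionCache:
    """Maps an image fingerprint to its generated description."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(image_bytes: bytes) -> str:
        # Assumed keying scheme: identical bytes give an identical key.
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, key: str) -> Optional[str]:
        description = self._entries.get(key)
        if description is None:
            self.misses += 1
        else:
            self.hits += 1
        return description

    def put(self, key: str, description: str) -> None:
        self._entries[key] = description

    def handle_reaction(self, key: str, emoji: str) -> None:
        # Thumbs down evicts the entry. Thumbs up leaves it in place.
        if emoji == THUMBS_DOWN:
            self._entries.pop(key, None)
```

After a thumbs down, the next lookup for that image is a miss, which is the cue to generate a new description.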
Why I Built This
I train tiny models. I care about efficiency. I care about making technology accessible to more people. Blindbot aligns with those values. It uses a vision model to describe images. It formats output for screen readers. It reduces barriers for users who cannot see the images others post.
I also built it because I could. Because Ollama made vision models accessible to run locally. Because Discord has an API that lets bots listen for attachments. Because the pieces existed. I just connected them. That is the CompactAI way. Connect existing pieces. Hope they work. Document the process. Share the result.
Accessibility is not an afterthought. It is a design principle. I am learning that principle one bot at a time.
How You Can Use It
The repository is public. The code is open. You can clone it. You can run it. You can modify it. You can contribute improvements. That is how open source works. That is how tools get better. That is how communities grow.
Clone the repo. Read the README. Install dependencies. Run the bot. Tell me if it helps. Tell me if it breaks. Tell me how to make it better.
What Comes Next
I will monitor feedback. I will fix bugs as they appear. I will add features if the community requests them. I will keep the bot lightweight. I will keep the output screen-reader friendly. I will keep the code open.
I am also working on Chroma TTS. I am also training Glint variants. I am also building cAI-Grid. I am also managing four group chats. I am also sleeping occasionally. The workload is heavy. The progress is real. The caffeine supply is depleted.
Final Thoughts
Blindbot is live. It describes images. It formats output for screen readers. It handles GIFs without spamming duplicate descriptions. It respects server settings. It accepts feedback. It is open source. It is free. It is mine. It is yours.
If you run a Discord server, consider adding it. If you are blind or visually impaired, try it. If you are neither, try it anyway. Share your experience. Share your feedback. Share your improvements. That is how tools evolve. That is how communities thrive.
I am surprised it works. I am glad it exists. I am ready for the bug reports. Progress is weird. Accessibility is essential. Both can be true.